
Conversation

@SkafteNicki (Collaborator) commented Aug 27, 2025

What does this PR do?

Fixes #21069

Before submitting
  • Was this discussed/agreed via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you list all the breaking changes introduced by this pull request?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or minor internal changes/refactors)

PR review

Anyone in the community is welcome to review the PR.
Before you start reviewing, make sure you have read the review guidelines. In short, see the following bullet-list:

Reviewer checklist
  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

📚 Documentation preview 📚: https://pytorch-lightning--21120.org.readthedocs.build/en/21120/

@SkafteNicki self-assigned this Aug 27, 2025
@SkafteNicki added the docs (Documentation related) label Aug 27, 2025
@github-actions bot added the pl (generic label for the PyTorch Lightning package) label Aug 27, 2025
Comment on lines +72 to +75
trainer.fit()
├── setup(stage="fit")
│   └── [Callbacks only]
@GdoongMathew (Contributor) commented:

Just a suggestion: perhaps adding this could also be helpful, as it also shows one of the known issues, #19658.

    ├── setup(stage="fit")
    │   ├── [Callback.setup]
    │   ├── [LightningModule.configure_sharded_model]
    │   ├── [LightningModule.configure_model]
    │   └── [Strategy.setup]
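A minimal, runnable sketch of how this order can be checked locally. It assumes the lightning 2.x package layout with the configure_model hook; OrderCallback and OrderModel are made-up names, and only the Callback- and LightningModule-level hooks from the suggestion are instrumented:

    import torch
    from torch.utils.data import DataLoader, TensorDataset
    import lightning as L


    class OrderCallback(L.Callback):
        def setup(self, trainer, pl_module, stage):
            print(f"Callback.setup(stage={stage!r})")


    class OrderModel(L.LightningModule):
        def __init__(self):
            super().__init__()
            self.layer = torch.nn.Linear(4, 1)

        def configure_model(self):
            print("LightningModule.configure_model")

        def setup(self, stage):
            print(f"LightningModule.setup(stage={stage!r})")

        def training_step(self, batch, batch_idx):
            x, y = batch
            return torch.nn.functional.mse_loss(self.layer(x), y)

        def configure_optimizers(self):
            return torch.optim.SGD(self.parameters(), lr=0.1)


    if __name__ == "__main__":
        data = DataLoader(TensorDataset(torch.randn(8, 4), torch.randn(8, 1)), batch_size=4)
        trainer = L.Trainer(max_epochs=1, limit_train_batches=1, logger=False,
                            enable_checkpointing=False, enable_progress_bar=False,
                            callbacks=[OrderCallback()])
        trainer.fit(OrderModel(), data)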

Comment on lines +80 to +82
│   └── [Strategy]
├── on_sanity_check_start()
@GdoongMathew (Contributor) commented Aug 27, 2025:

Perhaps we could also add when the checkpoint and state_dict are loaded?

on_fit_start()

if strategy.restore_checkpoint_after_setup:
    [LightningModule.on_load_checkpoint]
    [Strategy.load_model_state_dict]

[optimizer.load_state_dict]
[lr_scheduler.load_state_dict]

on_sanity_check_start()
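A hedged, runnable sketch of how the restore point can be observed when resuming with ckpt_path. The file name resume_demo.ckpt and the class names are invented, and only LightningModule-level hooks are instrumented (not the Strategy or the optimizer/scheduler restore):

    import torch
    from torch.utils.data import DataLoader, TensorDataset
    import lightning as L


    class ResumeModel(L.LightningModule):
        def __init__(self):
            super().__init__()
            self.layer = torch.nn.Linear(4, 1)

        def on_fit_start(self):
            print("LightningModule.on_fit_start")

        def on_load_checkpoint(self, checkpoint):
            print("LightningModule.on_load_checkpoint")

        def training_step(self, batch, batch_idx):
            x, y = batch
            return torch.nn.functional.mse_loss(self.layer(x), y)

        def configure_optimizers(self):
            return torch.optim.SGD(self.parameters(), lr=0.1)


    def make_trainer():
        return L.Trainer(max_epochs=1, limit_train_batches=1, logger=False,
                         enable_checkpointing=False, enable_progress_bar=False)


    if __name__ == "__main__":
        data = DataLoader(TensorDataset(torch.randn(8, 4), torch.randn(8, 1)), batch_size=4)
        first = make_trainer()
        first.fit(ResumeModel(), data)             # plain run, nothing to restore
        first.save_checkpoint("resume_demo.ckpt")  # produce a checkpoint to resume from
        # Resuming: compare where on_load_checkpoint prints relative to on_fit_start;
        # as noted above, this depends on strategy.restore_checkpoint_after_setup.
        make_trainer().fit(ResumeModel(), data, ckpt_path="resume_demo.ckpt")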

Comment on lines +209 to +210
└── teardown(stage="fit")
    └── [Callbacks only]
@GdoongMathew (Contributor) commented:

I believe that the LightningModule also has a teardown hook here.

    └── teardown(stage="fit")
        ├── [Callback]
        └── [LightningModule]

Comment on lines +204 to +209
├── on_fit_end()
│   ├── [Callbacks]
│   ├── [LightningModule]
│   └── [Strategy]
└── teardown(stage="fit")
@GdoongMathew (Contributor) commented:

From my personal code inspection, I believe that strategy.teardown should be called before the on_fit_end stage, and also that the Strategy class does not have an on_fit_end hook.

strategy.teardown()
    └── LightningModule.cpu()
on_fit_end()
    ├── Callback.on_fit_end()
    └── LightningModule.on_fit_end()
Callback.teardown()
LightningModule.teardown()
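A small sketch (class names invented) that instruments only the Callback and LightningModule hooks, not Strategy.teardown, so the relative order of on_fit_end and teardown(stage="fit") at the end of trainer.fit() can be checked against the diagram:

    import torch
    from torch.utils.data import DataLoader, TensorDataset
    import lightning as L


    class EndOrderCallback(L.Callback):
        def on_fit_end(self, trainer, pl_module):
            print("Callback.on_fit_end")

        def teardown(self, trainer, pl_module, stage):
            print(f"Callback.teardown(stage={stage!r})")


    class EndOrderModel(L.LightningModule):
        def __init__(self):
            super().__init__()
            self.layer = torch.nn.Linear(4, 1)

        def on_fit_end(self):
            print("LightningModule.on_fit_end")

        def teardown(self, stage):
            print(f"LightningModule.teardown(stage={stage!r})")

        def training_step(self, batch, batch_idx):
            x, y = batch
            return torch.nn.functional.mse_loss(self.layer(x), y)

        def configure_optimizers(self):
            return torch.optim.SGD(self.parameters(), lr=0.1)


    if __name__ == "__main__":
        data = DataLoader(TensorDataset(torch.randn(8, 4), torch.randn(8, 1)), batch_size=4)
        trainer = L.Trainer(max_epochs=1, limit_train_batches=1, logger=False,
                            enable_checkpointing=False, enable_progress_bar=False,
                            callbacks=[EndOrderCallback()])
        trainer.fit(EndOrderModel(), data)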

Comment on lines +135 to +140
│   │   ├── on_before_zero_grad()
│   │   │   ├── [Callbacks]
│   │   │   └── [LightningModule]
│   │   │
│   │   ├── [Forward Pass - training_step()]
│   │   │   └── [Strategy only]
@GdoongMathew (Contributor) commented:

I think training_step is called before on_before_zero_grad?

strategy.training_step()
    └── LightningModule.training_step()
on_before_zero_grad()
    ├── Callback.on_before_zero_grad()
    ├── LightningModule.on_before_zero_grad()
    └── LightningModule.optimizer_zero_grad()
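For reference, a runnable sketch (StepOrderModel is a hypothetical name; only module-level hooks are instrumented) that makes the relative order of training_step, on_before_zero_grad, the backward hooks, and on_before_optimizer_step visible:

    import torch
    from torch.utils.data import DataLoader, TensorDataset
    import lightning as L


    class StepOrderModel(L.LightningModule):
        def __init__(self):
            super().__init__()
            self.layer = torch.nn.Linear(4, 1)

        def training_step(self, batch, batch_idx):
            print("LightningModule.training_step")
            x, y = batch
            return torch.nn.functional.mse_loss(self.layer(x), y)

        def on_before_zero_grad(self, optimizer):
            print("LightningModule.on_before_zero_grad")

        def on_before_backward(self, loss):
            print("LightningModule.on_before_backward")

        def on_after_backward(self):
            print("LightningModule.on_after_backward")

        def on_before_optimizer_step(self, optimizer):
            print("LightningModule.on_before_optimizer_step")

        def configure_optimizers(self):
            return torch.optim.SGD(self.parameters(), lr=0.1)


    if __name__ == "__main__":
        data = DataLoader(TensorDataset(torch.randn(8, 4), torch.randn(8, 1)), batch_size=4)
        trainer = L.Trainer(max_epochs=1, limit_train_batches=2, logger=False,
                            enable_checkpointing=False, enable_progress_bar=False)
        trainer.fit(StepOrderModel(), data)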

Comment on lines +153 to +155
│   │   ├── on_before_optimizer_step()
│   │   │   ├── [Callbacks]
│   │   │   └── [LightningModule]
@GdoongMathew (Contributor) commented:

Perhaps also add configure_gradient_clipping here?

    │   │   ├── on_before_optimizer_step()
    │   │   │   ├── [Callbacks]
    │   │   │   ├── [LightningModule]
    │   │   │   └── [LightningModule.configure_gradient_clipping]
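For context, a hedged sketch of what overriding that hook looks like. ClippedModule is a made-up name, the clip value 0.5 is arbitrary, and self.clip_gradients is the built-in helper it delegates to:

    import lightning as L


    class ClippedModule(L.LightningModule):
        def configure_gradient_clipping(self, optimizer, gradient_clip_val=None,
                                        gradient_clip_algorithm=None):
            # Runs as part of the optimizer-step phase shown in the tree above.
            print("LightningModule.configure_gradient_clipping")
            # Delegate to the default clipping helper with a custom value.
            self.clip_gradients(optimizer, gradient_clip_val=0.5,
                                gradient_clip_algorithm="norm")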

Comment on lines +193 to +195
│   ├── [Callbacks - Non-monitoring only]
│   ├── [LightningModule]
│   └── [Callbacks - Monitoring only]
@GdoongMathew (Contributor) commented:

Perhaps Non-monitoring could be changed to Checkpoint & EarlyStopping to be more specific.

@Borda (Collaborator) commented Aug 27, 2025:

@GdoongMathew, all good calls. How about landing this as a baseline, and you can send another PR with your suggestions for improvement? :)

@GdoongMathew (Contributor) commented:

From my previous experience, since Lightning uses a closure function in the optimizer, tracing the optimizer step can be a bit difficult. Maybe we could also highlight that here?

            # closure function for optimizer
            strategy.training_step()
            └── LightningModule.training_step()

            callback.on_before_zero_grad()
            LightningModule.on_before_zero_grad()
            LightningModule.optimizer_zero_grad()
            strategy.backward()
            ├── callback.on_before_backward()
            ├── LightningModule.on_before_backward()
            ├── LightningModule.backward()
            │   └── loss.backward()
            ├── callback.on_after_backward()
            └── LightningModule.on_after_backward()

            callback.on_before_optimizer_step()
            LightningModule.on_before_optimizer_step()
            LightningModule.configure_gradient_clipping()
    
            optimizer.step(closure)

But this also makes my personal notes a bit messier as well...
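To illustrate the closure point without any Lightning machinery, here is a bare-PyTorch sketch (the tensors and the LBFGS choice are arbitrary) of why the zero-grad/backward work appears "inside" optimizer.step() when tracing automatic optimization:

    import torch

    model = torch.nn.Linear(4, 1)
    optimizer = torch.optim.LBFGS(model.parameters(), lr=0.1)
    x, y = torch.randn(8, 4), torch.randn(8, 1)


    def closure():
        # Roughly the work Lightning bundles into its training-step closure:
        loss = torch.nn.functional.mse_loss(model(x), y)  # ~ training_step
        optimizer.zero_grad()                             # ~ optimizer_zero_grad
        loss.backward()                                   # ~ strategy.backward
        return loss


    # The optimizer may invoke the closure one or more times inside step(),
    # which is why the hooks above appear nested under optimizer.step(closure).
    optimizer.step(closure)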

@GdoongMathew (Contributor) commented:

@Borda sure!! will open another PR for those suggestions. thanks~

@Borda (comment marked as off-topic)

@Borda requested a review from GdoongMathew on September 1, 2025, 11:32
@Borda merged commit 3d56296 into Lightning-AI:master on Sep 1, 2025
24 checks passed
@Borda (Collaborator) commented Sep 1, 2025:

@GdoongMathew, it has landed, so now you can send your suggestions with improvements 🚀

Borda added a commit that referenced this pull request Sep 3, 2025
* add hook order
* add to index
* Apply suggestions from code review
* BoringModel
* testoutput

---------

Co-authored-by: Jirka Borovec <[email protected]>
Co-authored-by: Jirka B <[email protected]>
(cherry picked from commit 3d56296)
lantiga pushed a commit that referenced this pull request Sep 5, 2025
* add hook order
* add to index
* Apply suggestions from code review
* BoringModel
* testoutput

---------

Co-authored-by: Jirka Borovec <[email protected]>
Co-authored-by: Jirka B <[email protected]>
(cherry picked from commit 3d56296)